Coherent Multi-sentence Video Description with Variable Level of Detail

Authors

Anna Rohrbach

Marcus Rohrbach

Wei Qiu

Annemarie Friedrich

Manfred Pinkal

Bernt Schiele

Abstract

Humans can easily describe what they see in a coherent way and at varying levels of detail. However, existing approaches for automatic video description focus mainly on single-sentence generation and produce descriptions at a fixed level of detail. In this paper, we address both of these limitations: we produce coherent multi-sentence descriptions of complex videos at a variable level of detail. We follow a two-step approach in which we first learn to predict a semantic representation (SR) from video and then generate natural language descriptions from the SR. To produce consistent multi-sentence descriptions, we model across-sentence consistency at the level of the SR by enforcing a consistent topic. We also contribute both to the visual recognition of objects, by proposing a hand-centric approach, and to the robust generation of sentences, by using a word lattice. Human judges rate our multi-sentence descriptions as more readable, correct, and relevant than those of related work. To understand the difference between more detailed and shorter descriptions, we collect and analyze a video description corpus with three levels of detail.
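The abstract states only that cross-sentence consistency is modeled at the SR level by enforcing a consistent topic. The toy sketch below illustrates that general idea under explicit assumptions; it is not the authors' model. It assumes, hypothetically, that each video segment yields scored candidate SRs reduced to `(activity, object)` pairs, and that each candidate topic supplies a prior over objects. A single topic is chosen for the whole video, and each segment's SR is then picked jointly with that topic rather than independently. All names (`decode_with_topic`, `topic_object_prob`, `floor`) and all numbers are invented for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical simplified SR: (activity, object). The paper's SR may contain more slots.
SR = Tuple[str, str]


def decode_with_topic(
    segment_scores: List[Dict[SR, float]],
    topic_object_prob: Dict[str, Dict[str, float]],
    floor: float = 1e-6,
) -> Tuple[Optional[str], List[SR]]:
    """Choose one topic for the whole video, then the best SR per segment under it.

    segment_scores: per segment, a visual confidence (> 0) for each candidate SR.
    topic_object_prob: per topic, a prior over objects (hypothetical stand-in for topic consistency).
    floor: probability assigned to objects that a topic does not list.
    """
    best_topic: Optional[str] = None
    best_total = -math.inf
    best_srs: List[SR] = []
    for topic, obj_prob in topic_object_prob.items():
        total = 0.0
        chosen: List[SR] = []
        for scores in segment_scores:
            # Log of (visual score * topic-conditioned object prior) for each candidate SR.
            joint = {
                sr: math.log(s) + math.log(obj_prob.get(sr[1], floor))
                for sr, s in scores.items()
            }
            sr = max(joint, key=joint.get)
            chosen.append(sr)
            total += joint[sr]
        # Keep the topic whose SR sequence has the highest total log score.
        if total > best_total:
            best_topic, best_total, best_srs = topic, total, chosen
    return best_topic, best_srs


# Made-up scores for three segments of a hypothetical kitchen video.
segments = [
    {("cut", "cucumber"): 0.6, ("cut", "carrot"): 0.4},
    {("peel", "carrot"): 0.5, ("peel", "cucumber"): 0.45},
    {("wash", "cucumber"): 0.55, ("wash", "carrot"): 0.45},
]
topics = {
    "prepare cucumber": {"cucumber": 0.9, "carrot": 0.1},
    "prepare carrot": {"cucumber": 0.1, "carrot": 0.9},
}

topic, srs = decode_with_topic(segments, topics)
print(topic)  # prepare cucumber
print(srs)    # [('cut', 'cucumber'), ('peel', 'cucumber'), ('wash', 'cucumber')]
```

In this made-up example, decoding each segment independently would pick `("peel", "carrot")` for the second segment and mix two objects across the sentences; conditioning every segment on one shared topic yields three SRs about the same object.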


Similar sources

Generation and grounding of natural language descriptions for visual data

Generating natural language descriptions for visual data links computer vision and computational linguistics. Being able to generate a concise and human-readable description of a video is a step towards visual understanding. At the same time, grounding natural language in visual data provides disambiguation for the linguistic concepts, necessary for many applications. This thesis focuses on bot...


Document Context Language Models

Text documents are structured on multiple levels of detail: individual words are related by syntax, and larger units of text are related by discourse structure. Existing language models generally fail to account for discourse structure, but it is crucial if we are to have language models that reward coherence and generate coherent texts. We present and empirically evaluate a set of multi-level ...


A Method to Reduce Effects of Packet Loss in Video Streaming Using Multiple Description Coding

Multiple description (MD) coding has evolved as a promising technique for promoting error resiliency of multimedia system in real-time application programs over error-prone communicational channels. Although multiple description lattice vector quantization (MDCLVQ) is an efficient method for transmitting reliable data in the context of potential error channels, this method doesn’t consider disc...


Subhashini Venugopalan Proposal

For most people, watching a brief video and describing what happened (in words) is an easy task. For machines, extracting the meaning from video pixels and generating a sentence description is a very complex problem. The goal of my research is to develop models that can automatically generate natural language (NL) descriptions for events in videos. As a first step, this proposal presents deep r...


A Full-Fuzzy Rate Controller for Variable Bit Rate Video

In this paper, we propose a new full-fuzzy video rate control algorithm (RCA) for variable bit rate (VBR) video applications. The proposed RCA provides high quality compressed video with a low degree computational complexity. By controlling the quantization parameter (QP) on a picture basis, it produces VBR video bit streams. The proposed RCA has been implemented on the JM H.264/AVC video codec and th...



Publication date: 2014
